Skip to content

Conversation

@NaccOll
Copy link
Contributor

@NaccOll NaccOll commented Aug 24, 2025

Related GitHub Issue

Closes: #7330

Description

Optimizing listCodeDefinitionNamesTool for object-oriented design language.

Tree-sitter has already parsed the Java grammar. However, there are differences between the listCodeDefinitionNamesTool and the test. The listCodeDefinitionNamesTool function filters out blocks with fewer than four lines, while the test displays all content.

The original design was to avoid returning small, meaningless blocks, but this doesn't fit with object-oriented design language.

Java is an object-oriented language, and projects often contain numerous interfaces. Implementation classes have @Override methods that override methods in interfaces or base classes, and rely on annotations to implement AOP.

The previous approach resulted in numerous method names being ignored or only one of multiple annotations being displayed, leading to LLM misinterpreting the code.

The issue was resolved by follow:

  1. ignore line count checks for definition.method
  2. improve java query to get both method signature and @annotations

Test Procedure

change src\services\tree-sitter_tests_\parseSourceCodeDefinitions.java.spec.ts and src\services\tree-sitter_tests_\fixtures\sample-java.ts

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

before:
image

after:
image

Documentation Updates

No documentation updates are required.

Additional Notes

Get in Touch

NaccOll


Important

Enhances processCaptures in index.ts to improve Java method parsing by skipping line count checks and annotations, with updated tests.

  • Behavior:
    • processCaptures in index.ts now skips line count checks for Java definition.method and skips annotations.
    • Introduces shouldSkip and skipJavaAnnotations functions in index.ts for Java-specific handling.
  • Tests:
    • Updates parseSourceCodeDefinitions.java.spec.ts to test new Java parsing behavior.
    • Modifies sample-java.ts to include more Java method cases with annotations.
  • Misc:
    • Removes minComponentLines parameter from processCaptures in index.ts.

This description was created by Ellipsis for f55baa6a5679ec89bf3cec23ade127aeceada382. You can customize this summary. It will automatically update as commits are pushed.

@NaccOll NaccOll requested review from cte, jr and mrubens as code owners August 24, 2025 17:48
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Aug 24, 2025
@NaccOll NaccOll changed the title enhance processCaptures function to accept minComponentLines parameter for better flexibility. FIX 7330 Optimizing listCodeDefinitionNamesTool for java.. FIX 7330 Aug 24, 2025
@NaccOll NaccOll changed the title Optimizing listCodeDefinitionNamesTool for java.. FIX 7330 Optimizing listCodeDefinitionNamesTool for java. FIX 7330 Aug 24, 2025
Copy link
Contributor

@roomote roomote bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! I've reviewed the changes to improve Java method parsing in the listCodeDefinitionNamesTool. The approach to skip line count checks for Java methods and handle annotations is good, but I found some issues that need attention before merging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test fixture modifications compress multi-line method declarations into single lines. Could this reduce test coverage for real-world Java code formatting? Consider keeping some multi-line examples to ensure the parser handles various formatting styles correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it actually expands to four lines just to accommodate currentMinComponentLines=4

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Aug 24, 2025
@NaccOll NaccOll force-pushed the hotfix-tree-sitter-java branch from b491b13 to 8970f77 Compare August 24, 2025 18:12
@NaccOll
Copy link
Contributor Author

NaccOll commented Aug 25, 2025

@roomote-agent

Please review this pull request.

  1. Does it complete the objectives of the issue?
  2. It should not affect any functionality other than Java methods.

@roomote
Copy link
Contributor

roomote bot commented Aug 25, 2025

Hi @NaccOll! I've seen your review request and I'm now reviewing this PR. I'll check if it properly addresses issue #7330 and verify that it only affects Java method handling as requested. I'll provide detailed feedback shortly.

@roomote
Copy link
Contributor

roomote bot commented Aug 25, 2025

Pull Request Review Complete ✅

Hi @NaccOll! I have completed my review of this PR. Here are my findings:

1. Does it complete the objectives of issue #7330? ✅ YES

The PR successfully addresses both issues mentioned in #7330:

  • Fixed: Java interface methods (even single-line ones) are now properly recognized instead of being filtered out
  • Fixed: Java methods with @Override annotations now show the actual method name instead of just "@OverRide"

2. Does it only affect Java method handling? ✅ YES

The changes are properly scoped to Java-specific handling:

Key Changes:

  1. Language-specific skip rules: Added LANGUAGE_SKIP_RULES configuration that currently only affects Java definition.method captures

  2. Smart line detection for Java methods: New findJavaMethodSignatureLine() function that:

    • Skips annotations (like @Override, @Test) to find the actual method signature
    • Uses optimized regex patterns to avoid backtracking issues
    • Falls back gracefully if no signature is found
  3. Conditional line count checking: The shouldSkipLineCountCheck() function ensures that Java methods are exempt from the 4-line minimum requirement while other languages remain unaffected

Code Quality Assessment:

Strengths:

  • ✅ Clean, modular implementation with language-specific handling
  • ✅ Extensible design - other languages can easily be added to LANGUAGE_SKIP_RULES
  • ✅ Performance-conscious regex patterns that avoid catastrophic backtracking
  • ✅ Comprehensive test coverage with 12 passing test cases
  • ✅ All existing tests (304 tests across 60 test files) continue to pass

Minor Suggestions for Future Improvements:

  • Consider adding a comment explaining why Java methods need special handling (OOP nature, interfaces, annotations)
  • The regex patterns in findJavaMethodSignatureLine() could potentially be extracted as constants for better maintainability

Test Results:

✓ All Java-specific tests pass (12/12)
✓ All tree-sitter tests pass (304/304)
✓ No regression in other language parsers

Conclusion:

This PR effectively solves the reported issues without introducing any side effects. The implementation is clean, well-tested, and properly scoped to Java. The approach of using language-specific rules is a good architectural decision that will make it easy to handle similar issues in other languages if they arise.

Recommendation: APPROVE

Great work on this fix! The solution elegantly handles the unique characteristics of Java code while maintaining backward compatibility.

@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Aug 26, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Aug 26, 2025
@daniel-lxs
Copy link
Member

Hey @NaccOll thank you for your contribution! Can this problem be solved by removing the minimum line check for all the languages? I would prefer a general fix rather than creating langauge-specific parsing functions.

Let me know what you think!

@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Changes Requested] in Roo Code Roadmap Aug 27, 2025
@NaccOll
Copy link
Contributor Author

NaccOll commented Aug 28, 2025

Hey @NaccOll thank you for your contribution! Can this problem be solved by removing the minimum line check for all the languages? I would prefer a general fix rather than creating langauge-specific parsing functions.

Let me know what you think!

There are two problem here.

  1. For object-oriented languages, the listCodeDefinitionNamesTool always ignores their interface definitions due to the minimum line check.

Because interface definitions often only have method signatures and no implementation, they are only one line. This issue can be addressed with the following method:

if (["definition.method"].includes(capture.name)) {
return false
}

I only did this for Java because I wanted to focus on resolving the issue mentioned.

  1. The first line of a method definition is not the method signature itself.

For Java, I modified the syntax capture expression to capture both annotations and method signatures.(by definition.method.start)

Please review again.

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Aug 28, 2025
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Sep 5, 2025
@NaccOll NaccOll force-pushed the hotfix-tree-sitter-java branch from 0e24e97 to 59a3dc9 Compare September 5, 2025 06:42
@NaccOll NaccOll requested a review from daniel-lxs September 5, 2025 14:58
@daniel-lxs daniel-lxs moved this from PR [Changes Requested] to PR [Needs Review] in Roo Code Roadmap Sep 5, 2025
@daniel-lxs daniel-lxs moved this from PR [Needs Review] to PR [Needs Prelim Review] in Roo Code Roadmap Sep 5, 2025
Copy link
Member

@daniel-lxs daniel-lxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! I've reviewed the changes and while they do address the issue, I'm wondering if there's a simpler approach we could consider.

The PR description mentions that "The original design was to avoid returning small, meaningless blocks, but this doesn't fit with object-oriented design language." Since the core issue is that the 4-line minimum threshold filters out legitimate Java interface methods and single-line methods with annotations, wouldn't it be simpler to just reduce the DEFAULT_MIN_COMPONENT_LINES_VALUE from 4 to 1 or 2 globally?

This would:

  1. Solve the Java interface method issue (as shown in issue #7330 where TestInterface methods were being filtered out)
  2. Avoid adding the complexity of special-casing methods with the new OutputItem type and deduplication logic
  3. Keep the codebase simpler and more maintainable
  4. Still filter out truly empty or meaningless blocks (0 lines)

What do you think about this simpler approach? It seems like it would achieve the same goal without the added complexity of special handling for method captures.

@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Needs Review] in Roo Code Roadmap Sep 11, 2025
@daniel-lxs daniel-lxs moved this from PR [Needs Review] to PR [Changes Requested] in Roo Code Roadmap Sep 11, 2025
@NaccOll
Copy link
Contributor Author

NaccOll commented Sep 12, 2025

The PR description mentions that "The original design was to avoid returning small, meaningless blocks, but this doesn't fit with object-oriented design language." Since the core issue is that the 4-line minimum threshold filters out legitimate Java interface methods and single-line methods with annotations, wouldn't it be simpler to just reduce the DEFAULT_MIN_COMPONENT_LINES_VALUE from 4 to 1 or 2 globally?

First, changing DEFAULT_MIN_COMPONENT_LINES_VALUE to 1 only resolves signature retrieval for methods with only three lines. To address the issue of one-line methods in interface classes in object-oriented design languages, changing DEFAULT_MIN_COMPONENT_LINES_VALUE to 0 is necessary. However, changing DEFAULT_MIN_COMPONENT_LINES_VALUE to 0 results in returning small, meaningless blocks.

Please allow me to further explain a few key changes.

  1. The reason for skipping line count checks only for definition.method and definition.method.start is to return the method signatures in the interface class while avoiding returning meaningless lines of other syntaxes, such as various imports and single-line comments.
  2. The Java query was modified to retrieve the corresponding method signature when an annotation is present in the Java method.
  3. The Java query change causes method signatures to be displayed twice when the Java method does not have an annotation. The deduplication logic is designed to address this issue. However, in a subsequent pull request, I will implement this deduplication logic for all syntaxes. I've discovered that each language has its own duplication issues.

The image directly is more intuitive.

Change DEFAULT_MIN_COMPONENT_LINES_VALUE to 1
image
image

You can see that changing it to 1 has no effect

Change DEFAULT_MIN_COMPONENT_LINES_VALUE to 0
image
image
image

You can see that after changing it to 0, the methods of the interface class can be obtained, but the methods with annotations in the implementation class still cannot be obtained, and many meaningless blocks are introduced.

If I change DEFAULT_MIN_COMPONENT_LINES_VALUE to 0 and then modify the Java query, I can get the annotated method signatures of the implementation class, but there are still a lot of meaningless blocks, and the non-annotated method signatures appear twice, even though this only requires changing two lines of code.

image image

If you think the deduplication logic is too complex and meaningless, I can remove the third point in the key changes. However, it makes sense to modify the Java query and skip the row check only for the method. Please let me know what you think.

@NaccOll NaccOll requested a review from daniel-lxs September 12, 2025 02:41
@daniel-lxs daniel-lxs moved this from PR [Changes Requested] to PR [Needs Prelim Review] in Roo Code Roadmap Sep 18, 2025
@daniel-lxs daniel-lxs closed this Sep 22, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Sep 22, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 22, 2025
Copy link
Member

@daniel-lxs daniel-lxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architectural concern: this PR introduces method-specific logic in the generic tree-sitter pipeline (src/services/tree-sitter/index.ts), which appears Java-driven. I’m okay with Java-specific adjustments in the Java query, but I’d like to avoid central special-casing here. I think the best solution would be a combination of fixing for the min line requirement for all languages and updating the java query.

currentMinComponentLines = value
}

function passesMinLines(lineCount: number, capture: QueryCapture, language: string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method-specific bypass (added to handle Java) brings language concerns into the generic processor. Can we keep language-specific handling in queries (e.g., java.ts) or adopt a simpler language-agnostic rule, rather than special-casing in index.ts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about this. In the latest code, I did not introduce any language-specific processing logic in index.ts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request PR - Needs Preliminary Review size:L This PR changes 100-499 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

listCodeDefinitionNamesTool does not recognize method names in java file

3 participants